Added ship_traffic AIS example #130

Merged
merged 46 commits into from
Jan 27, 2021

jbednar
Contributor

@jbednar jbednar commented Jan 19, 2021

This notebook/dashboard shows USA AIS vessel (typically ship) tracking data collected from marinecadastre, colored by vessel type:

[Five screenshots of the notebook/dashboard, taken 2021-01-18]

Apart from the actual data shown, which is interesting enough, the notebook also shows:

  • How to process a large set of CSV files efficiently, caching the result in a fast and efficient Parquet file.
  • How to create and persist a spatial index for SpatialPandas.
  • How to plot categorical data using Datashader.
  • How to select and highlight individual data points in a datashaded plot.
  • How to drill down into a datashaded plot to reveal metadata about each point that is not shown in the plot itself.

To do items:

  • Add a series of static zoomed images like in this issue, to show data at each scale.
  • Highlight some interesting bits of the data
  • The spatial points selection code isn't currently working; I believe @jlstevens can replace it with code that he has.
  • I'm not sure the spatial indexing is working at all; each zoom is slow even when zooming in only slightly.
  • After selecting a point, it would be useful to update a short but wide table of data below the plot to show that row of data in the original dataset.
  • It would also be great to be able to show additional metadata about each selected point:
    • We would first need to build a table with metadata about the vessels covered by this dataset, by reading the data and collecting the latest non-NaN value for ['imo', 'call_sign', 'vessel_name', 'vessel_type','length', 'width'] for each 'mmsi_id'. This data shouldn't vary per ping, and can be stored separately from the ping data.
    • Once a point has been selected, look up this vessel info in a separate table
  • It would also be great to show connected tracks:
    • Highlight a specific track to see how things connect, on top of a datashaded points plot
    • Datashade the whole set of pings, which might be slow and also may be confusing when it connects distant points -- may need to filter out gaps.
  • Maybe selection could be in one of several modes:
    • Select nearest (default): highlights one point with a circle and shows one row in a table.
    • Select nearest N: highlights several nearby points and shows a table with multiple rows, perhaps with hovering over one point highlighting that row in the table to disambiguate.
    • Select all points with the same column value as the selected one (MMSI in this case): highlights all such points and shows a table with multiple rows (up to a max).
    • Same as the previous mode, but connecting the points into a trajectory by datetime.
  • Once info is available per point, need to collect interesting examples and talk about them in the notebook
  • Test deployed app - responsive like in the notebook?
  • Add dataset download info to anaconda-project.yml, putting the data (e.g. vessel metadata and Parquet file of points) on S3?
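
The vessel metadata item above (latest non-NaN value per 'mmsi_id', stored separately from the pings) could look something like the sketch below. It is only an illustration under assumptions: the timestamp column name `date_time` and the helper names `build_vessel_table` and `lookup_vessel` are hypothetical, and pandas is used in place of whatever the notebook ends up using.

```python
import numpy as np
import pandas as pd

# Per-vessel columns named in the to-do item; they should not vary per ping.
VESSEL_COLUMNS = ["imo", "call_sign", "vessel_name", "vessel_type", "length", "width"]


def build_vessel_table(pings, time_column="date_time"):
    """Collect the latest non-NaN value of each vessel column per 'mmsi_id'.

    `time_column` is an assumed name for the ping timestamp column.
    """
    ordered = pings.sort_values(time_column, kind="stable")
    # GroupBy.last() skips missing values column by column, so each cell
    # holds the most recent non-NaN value seen for that vessel.
    return ordered.groupby("mmsi_id")[VESSEL_COLUMNS].last()


def lookup_vessel(vessels, mmsi_id):
    """Return the metadata row for one selected vessel as a dict."""
    return vessels.loc[mmsi_id].to_dict()
```

Once a point is selected, `lookup_vessel(vessels, selected_mmsi)` would return the row to display, where `selected_mmsi` stands for whatever identifier the selection yields.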

@jbednar jbednar added the "wip" label (Work in progress; do not merge) Jan 19, 2021
@jlstevens
Contributor

Very nice! This data generates some very pretty plots.

How to select and highlight individual data points in a datashaded plot
How to drill down into a Datashaded plot to reveal metadata about each point not shown in the plot itself

I'm currently working on different approaches to do this. Right now I have one prototype (that needs to be better encapsulated) but I am also working on a more efficient approach which will probably be needed for this large dataset.

@jlstevens
Contributor

jlstevens commented Jan 19, 2021

Together with @philippjfr, we identified the reason for the massive performance difference: you want to use pyarrow>=2 otherwise things are exceedingly slow (when reading parquet).

@philippjfr
Contributor

Together with @philippjfr, we identified the reason for the massive performance difference: you want to use pyarrow>=2 otherwise things are exceedingly slow (when reading parquet).

Writing the Parquet file also gets faster, going from 30 minutes to 10 minutes, while reading goes from 10 minutes to 10 seconds.
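
Given how large the difference is, the notebook could check the installed pyarrow version up front. The following is a minimal sketch using only the standard library; the helper names `major_version` and `pyarrow_is_fast_enough` are hypothetical, and the threshold of 2 comes from the comment above.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '2.0.0'."""
    return int(version_string.split(".")[0])


def pyarrow_is_fast_enough(minimum_major=2):
    """Report whether pyarrow is installed with at least the given major version."""
    try:
        installed = version("pyarrow")
    except PackageNotFoundError:
        # pyarrow is not installed at all, so the fast path is unavailable.
        return False
    return major_version(installed) >= minimum_major


if not pyarrow_is_fast_enough():
    print("Warning: pyarrow>=2 not found; Parquet reads and writes may be very slow.")
```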

@jlstevens
Contributor

I'm guessing the data is from this page and obtained using:

wget -np -r -nH -L --cut-dirs=3 https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2020/

The download is 49.3 GB in size.

@jlstevens
Contributor

In 136c358 I added a (slow! non spatially indexed!) prototype of the inspect_points operation using holoviz/holoviews#4796 to also make use of the changes in holoviz/holoviews#4792

@jlstevens
Contributor

@jbednar Now you can use holoviz/holoviews#4794 which is branched off master instead of holoviz/holoviews#4796

@jbednar jbednar changed the title from "Added ship_tracking AIS example" to "Added ship_traffic AIS example" Jan 24, 2021
@jlstevens jlstevens merged commit adfb4d6 into master Jan 27, 2021
@jbednar jbednar removed the "wip" label (Work in progress; do not merge) Jan 27, 2021